Support deserializing dictionary encoded arrays with null values#308
Support deserializing dictionary encoded arrays with null values#308bennetthardwick wants to merge 1 commit into
Conversation
Arrow allows dictionary encoded arrays to have null values. These values are not specified as null in the validity bitmap but are known as "logical nulls" and should still be supported. Currently serde_arrow will return an error when attempting to deserialize a dictionary encoded array with null values. This change adds support for them and allows them to deserialize. Fixes chmp#307.
|
|
||
| #[test] | ||
| fn some_null_values_v2() { | ||
| fn some_null_values_out_of_range() { |
There was a problem hiding this comment.
I'm not sure this test is really required anymore. It seems to be testing that if the dictionary contains a null but isn't used, it still errors out.
There was a problem hiding this comment.
Good question. Not sure what I was thinking here. I would probably remove the test.
chmp
left a comment
There was a problem hiding this comment.
Really cool. Thanks a lot.
Can you maybe expand the test coverage slightly and update the Changes.md?
|
|
||
| #[test] | ||
| fn some_null_values_v2() { | ||
| fn some_null_values_out_of_range() { |
There was a problem hiding this comment.
Good question. Not sure what I was thinking here. I would probably remove the test.
|
|
||
| #[test] | ||
| fn all_null_values() { | ||
| let array = Array::Dictionary(DictionaryArray { |
There was a problem hiding this comment.
Can you also add a test with nullable keys? (which is supported by your impl and the format, if I read it correctly)
There was a problem hiding this comment.
I can't seem to push into your branch, otherwise I would've added the test myself, that's what I would've done:
#[test]
fn some_null_keys() {
let array = Array::Dictionary(DictionaryArray {
keys: Box::new(to_array(DataType::Int8, true, [Some(1), None, Some(0)])),
values: Box::new(to_array(DataType::Utf8, false, ["foo", "bar"])),
});
let deserializer =
ArrayDeserializer::new(String::from("$"), None, array.as_view()).unwrap();
assert_eq!(
Option::<String>::deserialize(deserializer.at(0)).unwrap(),
Some(String::from("bar")),
);
assert_eq!(
Option::<String>::deserialize(deserializer.at(1)).unwrap(),
None
);
assert_eq!(
Option::<String>::deserialize(deserializer.at(2)).unwrap(),
Some(String::from("foo")),
);
}There was a problem hiding this comment.
Thank you! I'm going to re-open this from a personal fork I think.
Arrow allows dictionary encoded arrays to have null values. These values are not specified as null in the validity bitmap but are known as "logical nulls" and should still be supported.
Currently serde_arrow will return an error when attempting to deserialize a dictionary encoded array with null values.
This change adds support for them and allows them to deserialize.
Fixes #307.